whoami?¶
- My name is Kai
- Involved in the Python ecosystem
- Senior software engineer at Cartesian Software
- Random Fact: I used to organise this meetup
This talks is for YOUR benefit¶
- Feel free to ask questions
- Feel free to interrupt with relevant comments
- Feel free to talk with me afterwards
More than just ML/AI!¶
- Three types of analytics:
- Descriptive: What happened/is happening?
- Predictive: What will happen?
- Prescriptive: What to do?
- Today I want to focus on prescriptive analytics
Mathematical Optimization¶
- Construct a mathematical model of reality and use it to make decisions
- Three major components:
- Decision Variables: The values we can control to find the best solution
- Objective Function: The function that tells us how well we are doing
- Constraints: The rules of the game
Linear Programs¶
- A type of mathematical programming
- Linear Programming the big idea: the objective and all constraints are linear in the decision variables
- We can model many problems by assuming linearity
- Usually quite easy to compute a solution of
Mixed Integer Program¶
- Linear program with the additional constraint that some of the decision variables are integers
- Usually harder to compute
What does linear mean?¶
- $f(x_1, x_2, \ldots, x_n) = c_1x_1 + c_2x_2 + \ldots c_nx_n + b$
- This is technically an affine function, but is considered linear in optimisation
Examples of linear functions¶
- $5x + 3$
- $3x + 7y -5$
- $2x-y+5z$
Examples of non-linear functions¶
- $x^2$
- $xy$
- $\sin(x)$
What does linear look like?¶
- In 2 dimensions a linear equation is a straight line
- In 3 dimensions a linear equation is a plane
- In 4 or more dimenions a linear equations is hyper plane (don't ask me what that looks like)

Today's problem¶
- I like to eat ...
- ... but I don't want to be unhealthy
- How can I eat the most while still meeting my dietary requirements?
Getting today's data¶
In [3]:
columns = [
"Category",
"Item",
"Calories",
]
Dietary Constraints¶
- Want to make sure our diet meets nutritional constraints
- Limit dimensions that are bad for us in excess
- Make sure we get at least enough of others
- Source: https://www.fda.gov/media/99059/download
In [4]:
upper_bound_constraints = {
"Total Fat": 78,
"Saturated Fat": 20,
"Cholesterol": 300,
"Sodium": 2_300,
"Sugars": 50,
}
In [5]:
lower_bound_constraints = {
"Carbohydrates": 275,
"Dietary Fiber": 28,
"Protein": 50,
}
In [6]:
all_cols = (columns
+ list(upper_bound_constraints.keys())
+ list(lower_bound_constraints.keys()))
In [7]:
menu = pd.read_csv(
menu_file_path,
usecols=all_cols,
index_col="Item"
)
In [8]:
# Filter out rows with no calories
# This can't be the case as all nutrients contain some caloric value
menu = menu[menu["Calories"] > 1e-6]
In [9]:
menu.head()
Out[9]:
| Category | Calories | Total Fat | Saturated Fat | Cholesterol | Sodium | Carbohydrates | Dietary Fiber | Sugars | Protein | |
|---|---|---|---|---|---|---|---|---|---|---|
| Item | ||||||||||
| Egg McMuffin | Breakfast | 300 | 13.0 | 5.0 | 260 | 750 | 31 | 4 | 3 | 17 |
| Egg White Delight | Breakfast | 250 | 8.0 | 3.0 | 25 | 770 | 30 | 4 | 3 | 18 |
| Sausage McMuffin | Breakfast | 370 | 23.0 | 8.0 | 45 | 780 | 29 | 4 | 2 | 14 |
| Sausage McMuffin with Egg | Breakfast | 450 | 28.0 | 10.0 | 285 | 860 | 30 | 4 | 2 | 21 |
| Sausage McMuffin with Egg Whites | Breakfast | 400 | 23.0 | 8.0 | 50 | 880 | 30 | 4 | 2 | 21 |
Simplifying the problem¶
In [10]:
len(menu)
Out[10]:
244
- We are working in 244 dimensions!
- I can't visualise 244 dimensions :(
- Let's reduce it to 2
In [11]:
restricted_menu = menu.loc[["Big Mac", "Side Salad"]]
In [12]:
restricted_menu.head()
Out[12]:
| Category | Calories | Total Fat | Saturated Fat | Cholesterol | Sodium | Carbohydrates | Dietary Fiber | Sugars | Protein | |
|---|---|---|---|---|---|---|---|---|---|---|
| Item | ||||||||||
| Big Mac | Beef & Pork | 530 | 27.0 | 10.0 | 85 | 960 | 47 | 3 | 9 | 24 |
| Side Salad | Snacks & Sides | 20 | 0.0 | 0.0 | 0 | 10 | 4 | 1 | 2 | 1 |
Formulating our model¶
- Almost all of the time linear programs are written in standard form
- This is represented as: $$ \begin{align} \text{max} \quad f = c^T&x \\ \text{subject to} \quad A&x \le b \\ &x \ge 0 \end{align}$$
So many letters...¶
$$ \begin{align} \text{max} \quad f = c^T&x \\ \text{subject to} \quad A&x \le b \\ &x \ge 0 \end{align}$$
- $f$ is the objective function
- $x$ is a vector of decision variables
- $c$ is the cost associated with a unit increase in $x$
- $A$ is a matrix of constraints (aka, how much does each decision variable contribute to that constraint?)
- $b$ is a vector of limits of the constraint
$x$, the decision variables¶
- What decisions can we make?
- Assign one variable per thing we can change
- let $x_1$ = number of Big Macs eaten
- let $x_2$ = number of side salads eaten
Objective function¶
- Function of the decision variables
- Minimise or Maximise
- composed of two things, the cost and the decision variables
- In the example:
- $x_1$ "costs" 530 calories
- $x_2$ "costs" 20 calories
- $\implies \text{maximise f} = 530x_1 + 20x_2$
The dietary constraints:¶
- At most 78g of total fat $\implies 27x_1 + 0x_2 \le 78$
- At most 20g of saturated fat $\implies 10x_1 + 0x_2 \le 20$
- At most 300mg of cholestrol $\implies 85x_1 + 0x_2 \le 300$
- At most 2300mg of sodium $\implies 960x_1 + 10x_2 \le 2300$
- At most 50g of sugars $\implies 9x_1 + 2x_2 \le 50$
- At least 275g of carbs $\implies 47x_1 + 4x_2 \ge 275$
- At least 28g of dietary fibre $\implies 3x_1 + 1x_2 \ge 28$
- At least 50g of protein $\implies 24x_1 + 1x_2 \ge 50$
Non negativity constraints¶
$x_1, x_2 \ge 0$
What does this look like?¶
- Let's visualise this in Python
- Note that the program is not in standard form
- Will ignore $\ge$ constraints for now.
- This is only to introduce our intuition of the problem
In [13]:
c = restricted_menu["Calories"].values
A = restricted_menu[list(upper_bound_constraints.keys())].T.values
b = np.array(list(upper_bound_constraints.values()), dtype=float)
lp = LP(A, b, c)
simplex_visual(lp).show()
Expanding our model¶
- Visualising models works in 2 or 3 dimensions
- Our full model has 244 items (or decision variables)
- That means we're working in 244 dimensions - which I can't visualise :(
The role of solvers¶
- Solvers help formulate and find solutions to optimisation problems
- Help with scaling problems to several orders of magnitude higher than what we're doing today
- There are many solvers:
- Free and Open Source: HiGHs, GLPK, LP_Solve, OR-Tools
- Commerical: Gurobi, CPLEX
What solver are we using today?¶
- We will be using the HiGHs solver
- This is a Free and Open Source solver
- It means you can play with this notebook whenever you want

The role of pyomo¶
- We're using another package; pyomo
- pyomo provides a unified way of formulating optimisation problems with different solvers
- This means we can change solvers without having to reformulate the whole program
In [14]:
m = pyo.ConcreteModel()
In [15]:
solver = pyo.SolverFactory("appsi_highs")
In [16]:
m.x = pyo.Var(
[x for x in menu.index],
domain=pyo.NonNegativeIntegers,
)
In [17]:
m.obj_func = pyo.Objective(
expr=pyo.sum_product(menu["Calories"], m.x),
sense=pyo.maximize,
)
In [19]:
for name, constraint_value in upper_bound_constraints.items():
constraint_name = f"constraint_{prettify(name)}"
expr = pyo.sum_product(menu[name], m.x) <= constraint_value
constraint = pyo.Constraint(expr=expr)
setattr(m, constraint_name, constraint)
In [20]:
for name, constraint_value in lower_bound_constraints.items():
constraint_name = f"constraint_{prettify(name)}"
expr = pyo.sum_product(menu[name], m.x) >= constraint_value
constraint = pyo.Constraint(expr=expr)
setattr(m, constraint_name, constraint)
In [21]:
_ = solver.solve(m)
In [22]:
m.obj_func()
Out[22]:
2300.0
In [24]:
get_basic_variables(m)
Out[24]:
{'Honey Mustard Snack Wrap (Grilled Chicken)': 2.0,
'Medium French Fries': 1.9999999999999039,
'Kids French Fries': 6.000000000000094,
'Side Salad': 22.999999999999176}
In [25]:
m.constraint_at_most_5_per_item = pyo.ConstraintList()
for index in menu.index:
m.constraint_at_most_5_per_item.add(m.x[index] <= 5)
In [26]:
_ = solver.solve(m)
In [27]:
m.obj_func()
Out[27]:
2260.0
In [28]:
get_basic_variables(m)
Out[28]:
{'Fruit & Maple Oatmeal without Brown Sugar': 2.0,
'Honey Mustard Snack Wrap (Grilled Chicken)': 2.0,
'Small French Fries': 3.0,
'Medium French Fries': 1.0,
'Kids French Fries': 1.0,
'Side Salad': 5.0}
In [29]:
def constrain_category_by_calories(model: pyo.ConcreteModel, df: pd.DataFrame, min_calories: int, category: str) -> None:
constraint_name = f"constraint_min_cals_from_category_{prettify(category)}"
indecies = df[df["Category"] == category].index
expr = pyo.sum_product(df["Calories"], model.x, index=indecies) >= min_calories
setattr(model, constraint_name, pyo.Constraint(expr=expr))
In [30]:
constrain_category_by_calories(m, menu, 300, "Breakfast")
constrain_category_by_calories(m, menu, 500, "Beef & Pork")
In [31]:
_ = solver.solve(m)
In [32]:
get_basic_variables(m)
Out[32]:
{'Hash Brown': 1.0,
'Fruit & Maple Oatmeal without Brown Sugar': 1.0,
'Hamburger': 1.0,
'Cheeseburger': 1.0,
'Premium Southwest Salad (without Chicken)': 1.0,
'Medium French Fries': 1.000000000000105,
'Large French Fries': 1.0,
'Side Salad': 5.0,
'Apple Slices': 1.0}
In [33]:
m.obj_func()
Out[33]:
2045.0000000000357
Can we help you?¶
- Endgame Analytics and Cartesian Software are optimisation experts
- If you have:
- the data: do you know the rules of the game?
- an optimisation problem: do you want to make something as efficient as possible?
- a valuable decision: the more complex the decision space the more valuable optimisation becomes
- Let's have a chat!